APPARATUS, SYSTEM, AND METHOD FOR
BALANCING LOADS TO NETWORK SERVERS

FIELD OF THE INVENTION

The present invention relates, in general, to balancing requests for
services, or loads, to network servers.

BACKGROUND OF THE INVENTION

In a client-server computing environment, such as the network
environment of the Internet, web sites offer a variety of services to the
users (clients) via computer programs operating on one or more servers
coupled to the network. In a simple server implementation of the web site, a
single server hosts the various programs that form a web site, and as each
request or "load" from a client is received at the server, the server
performs the requested operation and passes data to the client, thereby
satisfying the request (i.e., downloading text, audio, or video data to the
client for display in the client's network browser program). In this simple
model, difficulties can arise in servicing multiple requests from multiple
clients for services from a single web site, as the server may not have the
processing speed or throughput to service each of the multiple requests in a
timely fashion.

One conventional approach to address this problem is shown in Fig. 1.
Fig. 1 illustrates a client-server environment wherein a plurality of
servers 20 is coupled to a network 22, such as the Internet, for providing
various services from a single web site to one or more clients 24. A load
balancing device 26, employing a conventional "round-robin" algorithm, is
provided between the servers 20 and the network 22. The servers 20 of the web
site are configured as redundant servers, each having the same programs
thereon to provide the same services from the web site to the clients 24. As
requests for services are received at the web site, the load balancing
device 26 passes each new request to the next server in a "round-robin"
fashion. However, such an approach may still suffer from performance
difficulties.

Accordingly, what is needed is an apparatus, system and method for
balancing requests and loads from clients to servers of a web site in a
computing network. It is against this background that various embodiments of
the present invention were developed.

SUMMARY OF THE INVENTION

According to one broad aspect of the invention, disclosed herein is a
device, also referred to herein as a load balancing device/apparatus, for
determining if a request from a client computing station for a service in a
network should be processed by a first server adapted to service the request
or by a second server adapted to service the request. The device includes a
front end module for receiving the request and translating the request into a
transparent message format, a coordinating module for determining if the
first and second servers are active, and at least one load balancing module,
in communications with the first and second servers, for determining whether
the first or second server should service the request, and passing the
request to the appropriate first or second server, as determined thereby.

In one example, the front end module translates the request into either
an XML format (extensible markup language) or a binary format. Preferably,
the load balancing module receives a quality of service metric from the first
server and from the second server, and determines whether the first or second
server should service the request based in part on the metrics. As used
herein, the term "quality of service" (QoS) includes, but is not limited to,
one or more measures or metrics of the responsiveness of a server in
satisfying a client's request for service over a network. QoS and the
associated metrics provide information or data regarding the total network
system response, and may be affected by, for example, server loading, network
loading, burst traffic, or the like.
Also, the load balancing module can obtain the number of pending
requests at the first server, a number representing the time required to
service the pending requests by the first server, the number of pending
requests at the second server, and a number representing the time required to
service the pending requests by the second server. In this example, the load
balancing module determines whether the first server should service the
request based, at least, on the quality of service metrics obtained from the
first and second servers, the number of pending requests at the first server,
the number representing the time required to service the pending requests at
the first server, the number of pending requests at the second server, and
the number representing the time required to service the pending requests at
the second server.

The device can also include a first communications interface to the first
server for coupling the load balancing module to the first server, a second
communications interface to the second server for coupling the load balancing
module to the second server, and a third communications interface reserved
for the dynamic addition of a third server for coupling the load balancing
module to the third server. Additionally, the device can include an
additional load balancing module reserved for the dynamic addition of a new
service.

According to another broad aspect of the invention, disclosed herein is
a system for receiving and servicing a request from a client computing
station for a service in a network. The system includes a first server
adapted to service the request, a second server adapted to service the
request, and a device for determining if the request should be processed by
the first server or the second server. Preferably, the device includes a
front end module for receiving the request and translating the request into a
transparent message format, a coordinating module for determining if the
first and second servers are active, and at least one load balancing module,
in communications with the first and second servers, for determining whether
the first server should service the request, and if so, passing the request
to the first server.

Preferably, the first server and second server each have an input queue
for tracking the pending requests to be processed by the server, and each
maintain a list of pending requests and a number corresponding to the time
for completing each of the pending requests.

Further, the system preferably includes a quality of service agent
operating at the client, a quality of service agent operating on the first
server adapted to communicate with the quality of service agent operating at
the client, and a quality of service agent operating on the second server
also adapted to communicate with the quality of service agent operating at
the client.

The quality of service agent operating on the first server is also
adapted to communicate with the load balancing module with a message
containing data of the quality of service between the client and the first
server. Likewise, the quality of service agent operating on the second server
is adapted to communicate with the load balancing module with a message
containing data of the quality of service between the client and the second
server. The load balancing module determines whether the first server should
service the request based in part on the data of the quality of service
between the client and the first server and the data of the quality of
service between the client and the second server.

According to another broad aspect of the invention, disclosed herein is
a method for distributing a request from a client for a service from a web
site having a plurality of servers adapted to service the request. The method
includes receiving the request and determining if the service requested is
offered by a first server and a second server of the plurality of servers. In
one embodiment, a "quality of service" (QoS) metric is obtained from the
first server, and a quality of service metric is obtained from the second
server. Based, at least, on the quality of service metrics obtained from the
first and second servers, a determination is made whether the first server
should service the request, and if so, the request is passed to the first
server.

The quality of service metric can take the form of a measure of the
performance being provided by a particular server to a client, for instance
during the duration of the service period (i.e., during the transmission of
data from the server to the client). In one example, a client agent operating
on the client is provided, and a server agent operating on the first server
is provided. In one example, the client agent transmits a message to the
server agent, the message containing a data rate of data transferred from the
client to the first server, wherein the data rate is used as a quality of
service metric of the first server. This data is used to determine which
server should service the request of the client.

In another embodiment, the number of pending requests at the first
server is obtained, as is the time (estimated, actual, or empirical) required
to service the pending requests by the first server. Similarly, the number of
pending requests at the second server is obtained, along with the time
required to service the pending requests by the second server. The
determining step then determines whether the first server should service the
request based, at least, on the quality of service metrics obtained from the
first and second servers, the number of pending requests at the first server,
the time required to service the pending requests at the first server, the
number of pending requests at the second server, and the time required to
service the pending requests at the second server. Again, the time required
can be an estimated time, actual time, or empirical time.

The quality of service metrics, and other performance characteristics of
the system, can be remotely accessed if desired.

In another embodiment, the client's request is translated into a
transparent message format usable by the first and second server, such as
XML format with a start designator identifying the beginning of the message,
a type designator identifying the type of service, and an end designator
indicating the end of the message. Binary format can also be used.

Furthermore, the method of the present invention permits dynamic
additions of additional servers to the system. Upon an addition of a third
new server to the web site, the presence of the new third server is detected
and it is determined if the new third server offers the service requested by
the client. A quality of service metric is obtained from the new third server
and is included in the determination of whether the first server should
service the request.
Moreover, the method of the present invention permits dynamic
addition of a new service on either an existing server or a new server to the
system. Upon addition of a new service to the web site, the presence of the
new service is detected, whereupon the load balancer offers the service
requested by a client. A quality of service metric is obtained from the
server managing the new service and is included in the determination of
whether the server offering the new service should receive and process the
client request.
The foregoing and other features, utilities and advantages of the
invention will be apparent from the following more particular description of
a preferred embodiment of the invention as illustrated in the accompanying
drawings and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 illustrates a block diagram of a conventional client-server system
having a load balancing device utilizing a conventional "round-robin"
algorithm for balancing loads in a network such as the Internet.

Fig. 2 illustrates a block diagram of one embodiment of the present
invention.

Fig. 3 illustrates a distribution of requests/loads internal to a server in
accordance with one embodiment of the present invention.

Fig. 4 illustrates an example of the logical operations performed by the
load balancer in accordance with one embodiment of the present invention.

Fig. 5 illustrates an example of the logical operations performed by a
server in accordance with one embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

In accordance with the present invention, a load balancing apparatus
and method therefor, as well as a system using the same, is disclosed herein.
In particular, the load balancing apparatus, referred to variously herein as
a "load balancer" or a "load balancing device," employs a unique and novel
set of decision criteria in determining which server coupled thereto should
receive and process a request or "load" from a client over the network.

Referring now to Fig. 2, a load balancer 30 in accordance with one
embodiment of the present invention is shown. The balancer 30 is an interface
between the network 32 and a plurality of servers 34 (34A, 34B are shown in
the example of Fig. 2). Each server 34A,B has a set of software programs,
such as "App1" and "App2" shown in Fig. 2, for providing the services offered
by the web site as requested by one or more clients 36 in the network. Each
server 34A,B also has an input queue therein, as will be described later with
reference to Fig. 3.

In particular, server 34A provides service "App1" and service "App2",
while server 34B provides service "App2." As will be discussed in greater
detail below, in response to a request from client 36 for service "App2", the
load balancer 30 of the present invention determines whether server 34A or
server 34B should process the client's request for service "App2", and upon
such determination, the load balancer passes the request for service "App2"
to the selected server.

In accordance with the present invention, the load balancer 30 shown in
Fig. 2 has knowledge of which applications ("App1", "App2") are loaded on
which servers 34A,B, so that the load balancer 30 can pass the client's 36
request for a particular service to the appropriate server or set of servers.
Furthermore, the load balancer 30 has knowledge of various server specific
"metrics" which are criteria used by the load balancer 30 to determine which
server should service a pending request from a client 36. In one example, the
load balancer 30 of the present invention receives information from each
server 34A,B relating to the number of pending jobs in the input queue of
that server, as well as the time required for each job to be completed by
that server. Further, the information from the server 34A,B may include
percent utilization of CPU cycles, percent utilization of network, and other
metrics that the server can collect from its own operating system registry.
In this manner, the load balancer 30 can compute a numerical metric which is
the product of the number of pending jobs in the input queue of a server,
multiplied by the time required to complete each job. Accordingly, the load
balancer 30 then has information relating to the ability of a particular
server to process an incoming request or load from a client 36. In another
example, the load balancer also receives a "quality of service" value from
monitoring processes running on the client and server platforms, described
below.

For example, if the number of jobs pending in the queue of server 34B
is one, and the time required to complete a job is approximately 1.5 seconds,
then in this example the metric would be 1.5. In contrast, if server 34A had
three jobs pending in its queue and required 1.0 second to complete a job,
then the metric for server 34A would be three. Accordingly, for an incoming
request for services or load from a client 36, the load balancer 30 would
pass the next incoming request to server 34B, as its metric indicates that
server 34B is more available to handle the incoming request than is
server 34A.
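
The queue metric in the example above can be sketched in a few lines of
Python. This is only an illustrative sketch under stated assumptions: the
function names `queue_metric` and `pick_server` are hypothetical, and the
rule "lowest metric wins" is inferred from the example rather than stated as
a formula in the text.

```python
def queue_metric(pending_jobs: int, seconds_per_job: float) -> float:
    """Pending jobs in the input queue multiplied by the time per job."""
    return pending_jobs * seconds_per_job


def pick_server(stats: dict[str, tuple[int, float]]) -> str:
    """Return the server whose queue metric is lowest (assumed selection rule)."""
    return min(stats, key=lambda name: queue_metric(*stats[name]))


# Figures taken from the example in the text: (pending jobs, seconds per job).
stats = {"34A": (3, 1.0), "34B": (1, 1.5)}
selected = pick_server(stats)
```

With the figures from the text, server 34A scores 3.0 and server 34B scores
1.5, so the sketch selects server 34B.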

The load balancer 30 as shown in Fig. 2 has a number of processes and
interfaces in accordance with one embodiment of the present invention. A
CGI front-end process 40 is provided for receiving data from a client
application or network browser 42, and converting the data into a desired
format. The CGI front-end process 40 is associated with one or more services
provided by application servers 34A,B, assuming that the application servers
34A,B have previously "registered" with the load balancer (registration is
described below). In one example, the incoming data is converted by the CGI
front-end process 40 from a plurality of client formats, into a format
compatible with the requested application and compatible with the remaining
load balancing processes. In one example, the CGI front-end process 40
converts the message format into a "transparent messaging" format usable by
the load balancing processes. Transparent messaging enables the various
internal processes of the load balancer to route and load balance network
requests without knowing the content of each message itself. In this way, the
message content is transparent to the load balancing system.

The format of a transparent message is an encapsulation in which the
application specific data is encapsulated in the payload or central portion
of the message. Around the payload is a start and type designator in the
front portion of the message, and a stop designator/identifier at the back
portion of the message. Two embodiments of the transparent messages are shown
in Tables 1 and 2. The first embodiment is in a binary format for efficiency
and compatibility with binary data. The second embodiment uses the industry
standard XML (extensible markup language) ASCII-based notations for use
with strictly ASCII messages and for extensibility of future applications.

Table 1. Format example of binary based transparent messages.

Byte:   0      4   5             6 .. N                        N+4
Data:   START      <type byte>   <variable number of bytes>    STOP

Table 2. Format example of XML based transparent messages.

<?xml version="1.0"?>
<MessageContent>
    <ServiceName>type designator</ServiceName>
    Application Data ...
</MessageContent>

In the binary format shown in Table 1, the start designator is the
hexadecimal equivalent of the character string value "START." The stop
designator is the hexadecimal equivalent value of the character string value
"STOP." In one example, the type byte is an 8 bit value that is registered
with CGI front-end process 40 and represents the type of application service
requested, corresponding for example to "App1" or "App2" shown in Fig. 2.
The remaining message content is formatted for the specific application
service being requested. Using this technique, the load balancer 30 does not
need to understand the content of the message to be able to forward the
message to a compatible application server 34A,B that is available and which
can most efficiently process the load.
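
The binary encapsulation of Table 1 might be sketched as follows. This is a
minimal sketch, not the patented implementation: the helper names
`encode_message` and `read_type` are hypothetical, and the layout is assumed
to be the five ASCII bytes of "START", one type byte, the opaque payload, and
the four ASCII bytes of "STOP".

```python
START = b"START"  # start designator, assumed to occupy bytes 0 through 4
STOP = b"STOP"    # stop designator, assumed to occupy the last four bytes


def encode_message(service_type: int, payload: bytes) -> bytes:
    """Wrap an opaque application payload in START / type byte / STOP."""
    if not 0 <= service_type <= 0xFF:
        raise ValueError("type designator must fit in one byte")
    return START + bytes([service_type]) + payload + STOP


def read_type(message: bytes) -> int:
    """Return the type byte without inspecting the payload itself."""
    if len(message) < len(START) + 1 + len(STOP):
        raise ValueError("message too short")
    if not (message.startswith(START) and message.endswith(STOP)):
        raise ValueError("missing START or STOP designator")
    return message[len(START)]


message = encode_message(2, b"\x00\x01payload")
service_type = read_type(message)
```

The router only ever touches the framing bytes, which is the sense in which
the payload stays transparent to the load balancer.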

In the XML format shown in Table 2, the start designator is the XML
compatible string "<ServiceName>". The end designator is the XML
compatible string "</ServiceName>". The type designator is an XML
compatible string located between the start and stop designators without
extra spaces. In this format, the start and stop designators identify the
location of the type designator within a compatible XML message. Using this
technique, the load balancer 30 does not need to understand the content of
the message in order to be able to forward the message to a compatible
application server 34A,B.
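
Locating the type designator between the two XML designators can be sketched
with plain string searching, so that the rest of the message is never
parsed. The function name `read_service_name` and the sample message are
assumptions made for illustration only.

```python
START_TAG = "<ServiceName>"
END_TAG = "</ServiceName>"


def read_service_name(message: str) -> str:
    """Return the text between the start and end designators."""
    start = message.find(START_TAG)
    if start == -1:
        raise ValueError("start designator not found")
    start += len(START_TAG)
    end = message.find(END_TAG, start)
    if end == -1:
        raise ValueError("end designator not found")
    return message[start:end]


# Hypothetical message in the shape of Table 2.
sample = (
    '<?xml version="1.0"?>'
    "<MessageContent>"
    "<ServiceName>App2</ServiceName>"
    "Application Data ..."
    "</MessageContent>"
)
service = read_service_name(sample)
```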

The use of XML compatible message formatting provides an extensible
and simple method of adding new services to the load balancing system
without requiring changes to the load balancing algorithms, software or
processes employed. Further, providing transparent messaging results in
greater speed and efficiency and eliminates any need for re-compiling due to
changes in the content of an application's messages.

After formatting by the CGI front-end 40, the transparent message is
passed to the Main Coordinator process 44, referred to also herein as a
"coordinator module," of the load balancer 30 for decoding and forwarding to
the appropriate load balancing module/process 46, 48, and ultimately to the
appropriate selected server through the appropriate communications thread 50,
52, or 54.

The Main Coordinator process 44 decodes the type designator to
identify to which load balancing module 46, 48 the message should be
forwarded. The Main Coordinator process 44 maintains a list 55 of the load
balancing modules 46, 48 and their associated applications (i.e., "App1",
"App2"). If the client's 36 request refers to a service that is to be
provided by one of the plurality of servers 34A,B coupled to the load
balancer 30, then the Main Coordinator process 44 passes the request to the
appropriate load balancing module 46, 48 for further processing. Otherwise,
if the Main Coordinator process 44 determines that the client's 36 request is
not addressed to one of the plurality of servers 34A,B coupled to the load
balancer 30, in one example, the Main Coordinator process 44 replies to the
client's request with a "service not available" message.
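
The coordinator's routing step can be sketched as a lookup from type
designator to load balancing module. The class and method names below are
hypothetical, and each load balancing module is modelled, purely for
illustration, as a callable that returns the name of the server it selected.

```python
from typing import Callable

SERVICE_NOT_AVAILABLE = "service not available"


class MainCoordinator:
    """Keeps a list of load balancing modules keyed by the service they handle."""

    def __init__(self) -> None:
        self._modules: dict[str, Callable[[bytes], str]] = {}

    def register(self, service: str, module: Callable[[bytes], str]) -> None:
        """Associate a service name with its load balancing module."""
        self._modules[service] = module

    def dispatch(self, service: str, message: bytes) -> str:
        """Forward the message to the matching module, or report unavailability."""
        module = self._modules.get(service)
        if module is None:
            return SERVICE_NOT_AVAILABLE
        return module(message)


coordinator = MainCoordinator()
coordinator.register("App2", lambda message: "34B")  # stand-in for module 48
```

Adding a new service then amounts to one more `register` call, which matches
the text's point that routing does not depend on message content.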

The load balancing modules 46, 48 determine which server should
service the client's request based on various metrics relating to each server
34A,B, such as quality of service, number of pending jobs, or other decision
criteria discussed herein or with respect to Fig. 4, or any combination
thereof. For example, the load balancing module 48 of Fig. 2 determines
whether a request for "App2" service should be processed by server 34A or
server 34B. Upon determining which server should process the service request,
the load balancing process 48 forwards the request to the chosen server.

The load balancing modules 46, 48 are coupled to each server 34A,B
through a plurality of communication interfaces, shown as threads 50, 52, 54,
with corresponding threads 56, 58, 60 at the servers 34A,B.

As previously described, one example of the metrics includes a
calculation of the number of pending jobs in a particular server's input
queue multiplied by the time required by the server to complete each job,
shown as "Server Stats" 62 in Fig. 2. Alternatively, the load balancing
decision process can account for a quality of service (QoS) figure, described
below, in making its determination. Upon determining to which server 34A,B
the client's request should be passed, the load balancing module 46, 48 then
forwards the client's request through the proper communication interface to
the appropriate server. The server 34A,B then places the request in its input
queue as described below with reference to Fig. 3.

It will be understood that while the load balancing decision process is
described as a portion of the functionality of the load balancing module
implemented in various processes 40, 44, 46, 48, such functionality can be
combined or subdivided or otherwise arranged differently, and may reside in a
portal or the like, and be incorporated therein.

Further in accordance with one embodiment of the present invention,
the quality of service "QoS" figure is provided and tracked throughout the
system and provides valuable information to the load balancer 30 in making
its determination as to which server 34A,B should process a client's request.
In one example, and as shown in Fig. 2, quality of service agents 70, 72 are
operated on each of the plurality of application servers 34A,B and quality of
service agents 74 are operated on each of the client 36 platforms, and
communicate QoS messages such as message 75 shown in Fig. 2. The QoS
agents on the application server and the client communicate to each other
over the duration of the provided service. Each agent sends QoS messages to
the other respective agent, essentially reflecting back to the sending side
what the receiving side is seeing in terms of network performance. In one
example, the messages between the respective QoS agents contain a QoS agent
identification number, sequence counter, time stamp, and other status
information such as CPU percent utilization, average data rate, bits
transferred, etc. The QoS agent 74 operated by the client communicates with
the respective QoS agent 70, 72 operated by the server via these status
messages, intrinsically measuring the quality of the network path there
between. The QoS agent 70, 72 operated by the application server 34A,B
supplies its performance metrics back to the load balancer 30 via a QoS
message, which is application server dependent. This QoS message is fed
back to the load balancer 30 at the beginning of each new request/load, or
more often if desired. Each application server's 34A,B QoS messages are
used by the load balancer 30 in its decision process of determining the best
or most appropriate application server to handle a new request or load. By
sending the QoS messages at a regular rate or "ping," the instantaneous and
average network performance can be gauged by computing the latency of the
messages as well as the variance in message latency. Furthermore, the status
information provides a measure of the loads at various points in the
networked system, from the clients to the application servers.
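
Gauging latency and latency variance from regularly sent QoS messages could
look like the sketch below. It assumes, beyond what the text states, that
each message's time stamp can be compared directly with its arrival time
(that is, the two clocks are synchronized) and that population variance is
the intended measure; the function name `latency_stats` is hypothetical.

```python
from statistics import mean, pvariance


def latency_stats(sent: list[float], received: list[float]) -> tuple[float, float]:
    """Return (average latency, latency variance) for matched QoS messages.

    `sent` holds the time stamps carried in the messages and `received`
    holds the corresponding arrival times, both in seconds.
    """
    latencies = [arrival - stamp for stamp, arrival in zip(sent, received)]
    return mean(latencies), pvariance(latencies)


steady = latency_stats([0.0, 1.0, 2.0], [0.5, 1.5, 2.5])
bursty = latency_stats([0.0, 0.0], [1.0, 3.0])
```

A steady path gives zero variance, while a bursty path with the same or
higher average shows up as a larger variance figure.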

For instance, if a client user 36 was receiving a very good response
time for file downloads, then the status information received from the client
pings or messages would show a greatly increasing number of bytes
transferred and a high average data rate. This would indicate that the
respective application server 34A or 34B was performing well. However, the
variance of ping latencies could be high, indicating a large amount of burst
data on the associated communications path. In the case of a file download,
the load balancer 30 may ignore the latency variance due to the inherent
variable data rate nature of a file download. In contrast, services such as
streaming voice or video would be very sensitive to latency variation. In
this case, a large variance in latency would, for example, trigger the load
balancer 30 to reroute future requests to other servers, possibly using
alternative communications paths.

Metrics computed from the data and status values in the QoS messages
can be used in place of or in combination with the queue metric described
above. For instance, average data rate of a server can be divided by the CPU
percent utilization. This would yield a metric indicating the performance of
the server, by which the plurality of servers 34A,B could be ranked. In this
example, a new client request would be routed to the server with the highest
ranking. Alternatively, the described metric could be mathematically divided
by the variance of the latency calculated from the ping rate variance. In
this approach, a high variance would reduce the ranking of a server, with a
high variance resulting in a new distribution of messages.

In this sense, the QoS agents 70, 72 and 74 provide feedback to the load
balancer 30 as to how well the services are sent by the respective server
34A,B and being received by the end user at the client station 36.
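
The two ranking variants described above can be sketched as follows. The
`QosReport` fields, the function names, and the sample figures are all
assumptions made for illustration; the sketch also assumes that CPU percent
utilization and latency variance are strictly positive, since the text does
not say how a zero divisor is handled.

```python
from dataclasses import dataclass


@dataclass
class QosReport:
    server: str
    avg_data_rate: float      # e.g. kilobytes per second
    cpu_percent: float        # CPU percent utilization, assumed > 0
    latency_variance: float   # variance of ping latency, assumed > 0


def score(report: QosReport, use_variance: bool = False) -> float:
    """Average data rate divided by CPU percent, optionally by variance too."""
    value = report.avg_data_rate / report.cpu_percent
    if use_variance:
        value /= report.latency_variance
    return value


def rank(reports: list[QosReport], use_variance: bool = False) -> list[str]:
    """Return server names ordered from highest to lowest ranking."""
    ordered = sorted(reports, key=lambda r: score(r, use_variance), reverse=True)
    return [r.server for r in ordered]


# Hypothetical figures: 34B is faster per CPU percent but has a burstier path.
reports = [
    QosReport("34A", avg_data_rate=800.0, cpu_percent=40.0, latency_variance=4.0),
    QosReport("34B", avg_data_rate=600.0, cpu_percent=20.0, latency_variance=10.0),
]
```

With these made-up figures the plain metric ranks 34B first, while dividing
by the latency variance moves 34A to the top, which is the reordering effect
the passage describes.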

Referring to Fig. 3, and in accordance with the present invention, a
server implementation is shown for a server, such as server 34A of Fig. 2,
coupled to the load balancer 30 of the present invention. The server 34A has
an input queue 80 for storing requests for services, and a job statistics
table 82 for storing data relating to the server metrics. The input queue 80
can be implemented as a global queue for all incoming requests, or as a set
of local input queues, each associated with a particular application (such as
"App 1" or "App 2") provided by the server 34A. For the "App 1" application
serviced by the server 34A, the server has a front end process/thread 84 and
a plurality of processes 86A,B,C for servicing the requests placed in the
respective locations of the queue. Similarly, for the "App 2" application
serviced by server 34A, a front end process/thread 88 and process 90 is
provided.

Fig. 3 will be described with respect to a request for "App 1" service
through front-end process 84. As a request from the load balancer is
received, the appropriate front end thread/process 84 receives the request
from the load balancer and places the request in the input queue 80. In one
example, the input queue 80 is a circular queue having N entries, such that
the front end thread/process 84 places an incoming request into the next
available location in the input queue 80. If the input queue 80 is full, then
the front end thread 84 of the server communicates to the load balancer, as
part of the server metrics, that the input queue 80 is "full." In response,
the load balancer avoids passing any further request to the particular server
with the full input queue until the load balancer receives a subsequent
message that the input queue 80 of the server is again available to accept
and process new requests.
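
A circular input queue with N entries and a "full" signal might be sketched
as below. The class name `InputQueue` and its methods are hypothetical, and
first-in, first-out servicing is an assumption; the text only says that a
request goes into the next available location.

```python
class InputQueue:
    """Fixed-size circular queue of pending service requests."""

    def __init__(self, capacity: int) -> None:
        self._slots: list[object] = [None] * capacity
        self._head = 0   # index of the next entry to be serviced
        self._count = 0  # number of occupied slots

    def is_full(self) -> bool:
        return self._count == len(self._slots)

    def put(self, request: object) -> bool:
        """Store a request; return False when full so the caller can report it."""
        if self.is_full():
            return False
        tail = (self._head + self._count) % len(self._slots)
        self._slots[tail] = request
        self._count += 1
        return True

    def get(self) -> object:
        """Remove and return the oldest request, or None if the queue is empty."""
        if self._count == 0:
            return None
        request = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        self._count -= 1
        return request


queue = InputQueue(capacity=2)
accepted = [queue.put("a"), queue.put("b"), queue.put("c")]
```

A `False` return from `put` corresponds to the point at which the front end
would tell the load balancer that the queue is full.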

The "worker processes" 86A,B,C illustrated in Fig. 3 receive tasks to
perform from an entry on the input queue 80. Each of these processes
86A,B,C is an executable image providing one of the services that may be
requested by a client. When a process 86A,B,C has completed a requested
service, it enters an idle state where it waits and periodically checks the
input queue for a new service request. If there is a service request in the
queue (i.e., the queue is not empty), the process 86A,B,C takes the queue
entry for the client user, releases its queue slot, and performs the service
requested. After completing the requested service, the process 86A,B,C goes
back into idle mode to look for a new entry in queue 80.

The general logic of the load balancer and of the server in the present
method is shown in Figs. 4 and 5, respectively. Referring to Fig. 4, the
logical ...

... await a new message.

... indicated by being at the top of the sorted metric array described with
reference to operation 106. After completion of operations 120, 122, control
is passed to the idle state 100.

In Fig. 5, one example of the logical operations associated with the
application server input queue 80 of Fig. 3 is illustrated. Preferably, the
front-end process 84, 88 of Fig. 3 performs these queue processing
operations. Operation 130 is the idle state of the front-end process, where
the front-end waits for a message or operation to be performed. Upon
receiving an incoming message, the front-end determines whether it is a
request for service message in operation 132. If it is not a request for
service message, then the message is discarded and the process returns to the
idle state in operation 130. Operation 134 determines if there is room in the
input queue for a new service request. If there is room, an estimate of the
time to complete the requested service is made in operation 136. The running
average of request service times is calculated in operation 138. This running
average can be calculated by at least two methods. First, for a batch service
request, such as a streaming video broadcast, the average is calculated as
the total amount of time required to process all requests at the server,
divided by the total number of requests pending. Second, for an interactive
service request, such as a software service, the average is calculated as the
total time of some previous number of completed requests, divided by the
number of previous requests. The service request and the time estimated are
stored in the next open queue slot in operation 140. The processing loop is
finished by decrementing the count of available queue slots in operation 142,
sending the time statistics (including the estimated, average times or
empirically derived time for completion) to the load balancer in
operation 144, sending the number of pending requests for service in
operation 146, and returning to idle state 130 and the next message.
5 In this manner, the server has commimicated to the load balancCT t^^ 

respectivemetrics for the server, in accordance with the present inventioiu 
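
By way of illustration only, the two averaging methods of operation 138 may be sketched as follows. The function and class names, and the window size of three completed requests, are hypothetical and are not taken from the specification.

```python
from collections import deque


def batch_average(pending_estimates):
    """Batch method: total estimated time of all pending requests,
    divided by the number of pending requests."""
    if not pending_estimates:
        return 0.0
    return sum(pending_estimates) / len(pending_estimates)


class InteractiveAverage:
    """Interactive method: total time of the last `window` completed
    requests, divided by how many of them were recorded."""

    def __init__(self, window=3):  # window size is an assumed parameter
        self.times = deque(maxlen=window)

    def record(self, seconds):
        # Older samples fall out automatically once the window is full.
        self.times.append(seconds)

    def value(self):
        if not self.times:
            return 0.0
        return sum(self.times) / len(self.times)
```

In this sketch the batch figure looks forward over the estimates stored in the queue, while the interactive figure looks back over measured completion times.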

If the queue is found to be full in operation 134, then the load balancer
is notified of a full queue in operation 148. As described above, the load
balancer will suspend sending any messages to this server until the queue
opens up. In this example, the front-end process waits a programmed time in
operation 150 before checking the queue again in operation 152. The
front-end process loops between operations 150 and 152 until a slot in the queue
becomes available. When a slot becomes available, a message indicating that
the queue is available is sent to the load balancer in operation 154. The
front-end process then returns to the idle state 130 until another message is
received.
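
The queue handling of Fig. 5 may be sketched, under stated assumptions, as a small front-end object. The class name, the message dictionaries, and the `notify` callback standing in for the channel to the load balancer are all hypothetical. The timed wait of operation 150 is left to the caller.

```python
class FrontEnd:
    """Illustrative front-end for a bounded input queue."""

    def __init__(self, capacity, notify):
        self.capacity = capacity
        self.queue = []            # entries are (request, estimated_seconds)
        self.notify = notify       # hypothetical channel to the load balancer
        self.reported_full = False

    def free_slots(self):
        return self.capacity - len(self.queue)

    def on_request(self, request, estimate):
        """Admit a request if there is room (operation 134)."""
        if self.free_slots() == 0:
            self.reported_full = True
            self.notify({"type": "queue_full"})        # operation 148
            return False
        self.queue.append((request, estimate))          # operation 140
        times = [t for _, t in self.queue]
        self.notify({"type": "metrics",
                     "average_time": sum(times) / len(times),
                     "pending": len(self.queue)})
        return True

    def on_slot_check(self):
        """Called after each programmed wait (operations 150 and 152)."""
        if self.reported_full and self.free_slots() > 0:
            self.reported_full = False
            self.notify({"type": "queue_available"})    # operation 154

    def take(self):
        """A worker process removes the oldest entry, if any."""
        return self.queue.pop(0) if self.queue else None
```

The "available" message is sent only after a "full" message has been sent, so the load balancer sees one resume message per suspension.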

Furthermore, in accordance with the present invention, the load
balancer provides a remote monitoring capability. In one example, each
server communicates the average time to service requests and the number of
pending jobs to the load balancer. In effect, this operation concentrates the
server load metrics for the entire network at the load balancer. A remote
"dial-in" process could gather the load metrics from one or more of the load
balancers in a network to obtain a global view of the performance and load on
the entire network.

Furthermore, in accordance with the present invention, the load
balancer is adapted to recognize, on a dynamic basis, the addition of a new
server or the replacement of an existing server. The discovery, identification
and coordination of the server pool are performed through a dynamic
communications system. In general, the load balancer initiates or offers one
additional communications channel at all times, shown for example in Fig. 2
as 51 or 57. In one example, when a new server is attached to the network,
the new server makes a request to send a message to the load balancer and as a
result, finds or discovers the additional channel of a load balancer. This
allows the new server to uniquely identify itself to the load balancer and
coordinate communications. The load balancer, in response to a message over
the additional channel, gathers the new server's information and adds a new
slot in the server statistics table. After the new server has been recorded by
the load balancer, a new additional channel is opened and maintained until the
server expressly terminates communications with the load balancer, or is
otherwise determined to be absent.

Furthermore, in accordance with the present invention, the load
balancer 30 is adapted to recognize, on a dynamic basis, the addition of a new
service in association with either an existing server or a new server. The new
service can be, for example, a new capability, function or utility performed by
a server. The discovery, identification and coordination of the new service are
performed through a dynamic communication service similar to the
aforementioned dynamic server coordination.

In general and referring to Fig. 2, the Main Coordinator process 44
initiates one additional, generic load balancing process/module 49 to discover
and manage a new service. The generic load balancing module 49 initiates or
offers a generic communication channel 53. In one example, when a new
service is attached to the network, the server managing this new service makes
a request to send a message to the generic load balancing module 49 and as a
result, finds or discovers the communication channel 53 of the generic load
balancing module 49. A generic naming practice is used to help the
service discover the available channel of the generic load balancing module.
Once a new service has been associated with the generic load balancing
module 49, the generic load balancing module 49 registers a new service name
with the Main Coordinator process 44 (for example, by using list 55) and
changes its name and channel name to reflect the new service. The Main
Coordinator process 44 then initiates yet another new generic load balancing
module (not shown) to replace the recently renamed load balancing module 49
in order to support the dynamic addition of yet another service.

Generally, the generic load balancing process/module 49 dynamically
discovers any new servers and operates similarly to load balancing
modules/processes 46, 48 once it has been renamed. As with load balancing
modules 46, 48, the newly named load balancing module 49 preferably uses
QoS metrics to decide to which server to send service requests. The name of
the load balancing module 49 registered with the Main Coordinator process 44
can then be used by client applications to request the new service offered
thereby. The capability of dynamically adding new services results, in part,
from the transparent messaging and QoS metric of embodiments of the present
invention.
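
As a hedged illustration of how a load balancing module might use reported metrics to choose a server, the sketch below ranks servers by an assumed score, the reported average service time multiplied by the number of pending requests, and skips any server that has reported a full input queue. The score, the function name, and the dictionary layout are assumptions made for this example. The specification does not prescribe this formula.

```python
def pick_server(metrics):
    """Return the name of the server with the lowest assumed score.

    metrics maps a server name to a dict with keys
    "average_time" (seconds), "pending" (count) and "full" (bool).
    """
    candidates = [
        (m["average_time"] * m["pending"], name)   # assumed wait estimate
        for name, m in metrics.items()
        if not m["full"]                           # full queues are skipped
    ]
    if not candidates:
        return None        # every server reported a full queue
    candidates.sort()      # best candidate ends up first in the sorted list
    return candidates[0][1]
```

Sorting the candidates mirrors the idea of a sorted metric array in which the preferred server sits at the top.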

In one embodiment, named pipes are used for communications between
the load balancer and the servers. Alternatively, sockets can be used. In
either case, a naming convention can be used to assist the server in opening a
communications channel and finding the additional channel associated with the
load balancer. In one example, the pipe or socket channel will be named after
the service that is being load balanced. For example, TRIMEDIT_SOCKET
could be used for the Trim Edit function in video content creation. In another
example, GENERIC_SERVICE_SOCKET could be used for the generic load
balancing process/module 49 (shown in Fig. 2) to facilitate the discovery and
dynamic recognition of new services.
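
One possible reading of this naming convention is sketched below. The helper name and the normalization rule (upper-case the service name, drop spaces and punctuation other than underscores, append a transport suffix) are assumptions inferred from the two example names and are not stated in the specification.

```python
def channel_name(service, transport="SOCKET"):
    """Derive a channel name from a service name (assumed rule)."""
    token = "".join(
        ch for ch in service.upper() if ch.isalnum() or ch == "_"
    )
    return f"{token}_{transport}"


# Assumed constant for the channel offered by the generic module.
GENERIC_CHANNEL = channel_name("GENERIC_SERVICE")
```

A server and a load balancer that share such a rule can each compute the channel name from the service name alone, without any configuration exchange.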

Referring to Fig. 2, the generic load balancing process/module 49 has
initiated a new named pipe 53 offered to new services. If a new service were
to be connected to the generic load balancing module 49, the new service
would make an open call in software to the generic named pipe 53 resulting in
a connection to the generic load balancing module 49. This action would
initiate the registering of a new service, the load balancing module 49 would
begin accepting requests for the new service, and the Main Coordinator
process 44 would initiate yet another new generic load balancing module (not
shown) to provide for yet another new service.
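
The registration sequence described above may be sketched as follows. The class names, the in-memory dictionary standing in for list 55, and the "_SOCKET" suffix are illustrative assumptions. No actual pipes or sockets are opened.

```python
class LoadBalancingModule:
    """Illustrative stand-in for a load balancing process/module."""

    def __init__(self, name):
        self.name = name
        self.channel = name + "_SOCKET"   # assumed channel naming
        self.servers = []


class MainCoordinator:
    """Illustrative stand-in for the Main Coordinator process."""

    GENERIC = "GENERIC_SERVICE"

    def __init__(self):
        self.services = {}    # service name -> module (stands in for list 55)
        self.generic = LoadBalancingModule(self.GENERIC)

    def register_new_service(self, service_name, server):
        # The generic module takes on the identity of the new service.
        module = self.generic
        module.name = service_name
        module.channel = service_name + "_SOCKET"
        module.servers.append(server)
        self.services[service_name] = module
        # A fresh generic module replaces the renamed one, so that yet
        # another service can be discovered later.
        self.generic = LoadBalancingModule(self.GENERIC)
        return module
```

After each registration the coordinator again holds exactly one unassigned generic module, which is what lets services be added without a restart.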

Referring to Fig. 2, the "App 2" load balancing process/module 48 is
currently managing two servers 34A,B. In this example, in order to support
the dynamic addition of a new third server, the load balancing process 48
would initiate/maintain a third named pipe 57. If a new server were to be
connected to the load balancer, the new server would make an open call
corresponding to "App 2" in software, resulting in the connection to this third
named pipe 57. This action would add the new server to the server pool and
the load balancer would begin to accept and pass service request messages to
the new server. Upon completion, the load balancer would initiate/maintain
yet another new additional named pipe (i.e., a fourth named pipe, not shown) to
provide for the dynamic addition of another (i.e., fourth) new server.
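
The practice of always holding one spare channel open may be sketched as below. The class name and the channel identifiers are hypothetical, and plain strings stand in for named pipes.

```python
class ServerPool:
    """Illustrative pool that always keeps one unassigned channel."""

    def __init__(self, service):
        self.service = service
        self.servers = {}       # channel id -> server name
        self.next_id = 1
        self.spare = self._open_spare()

    def _open_spare(self):
        # Stand-in for initiating a new named pipe.
        channel = f"{self.service}-pipe-{self.next_id}"
        self.next_id += 1
        return channel

    def connect(self, server_name):
        """A new server takes the spare channel and a new spare is opened."""
        channel = self.spare
        self.servers[channel] = server_name
        self.spare = self._open_spare()
        return channel
```

For example, after two servers have connected, the spare is the third channel. A third server takes it, and a fourth channel is opened in its place.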

Hence, embodiments of the present invention permit the dynamic
addition of new servers to the load balancer 30, or recognize the addition of
new services provided by the servers, without having to alter or restart the
load balancer 30.

This same mechanism can be used to dynamically detect the removal of
a server. When a server catastrophically goes down or is shut down
gracefully, the application server end of the named pipe or socket closes. This
is detected by the load balancer and indicates that the server is now
unavailable. The load balancer can remove the server from its list and remove
the named pipe or use the available channel for the new additional channel.
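
For the socket alternative, detection of a closed server end may be sketched as follows. The example uses `socket.socketpair()` from the Python standard library in place of a real network connection, and the function names are hypothetical. A zero-length read on a stream socket indicates that the peer has closed its end.

```python
import socket


def peer_closed(conn):
    """Return True if the other end of a stream socket has closed."""
    conn.setblocking(False)
    try:
        # MSG_PEEK inspects the stream without consuming any data.
        return conn.recv(1, socket.MSG_PEEK) == b""
    except BlockingIOError:
        return False       # nothing to read, but the peer is still there
    except ConnectionError:
        return True        # abrupt failure also means the peer is gone


def prune(servers):
    """Drop servers whose channel has closed; return the removed names.

    servers maps a server name to the load balancer's end of the socket.
    """
    removed = [name for name, conn in servers.items() if peer_closed(conn)]
    for name in removed:
        servers.pop(name).close()
    return removed
```

A load balancer could call such a check periodically, or before passing a request, and then reuse the freed channel as the new additional channel.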



WO 01/90903    PCT/US01/16658



The invention can be embodied in a computer program product. It will
be understood that the computer program product of the present invention
preferably is created in a computer usable medium, having computer readable
code embodied therein. The computer usable medium preferably contains a
number of computer readable program code devices configured to cause a
computer to effect the various functions required to carry out the invention as
herein described.

While the embodiments of the invention have been described with
respect to Figs. 2-5 wherein a single client 36 is shown communicating with
the load balancer 30 coupled to a pair of servers 34A,B, wherein server 34A
offers services "App 1" and "App 2" and server 34B offers service "App 2," it
will be understood that the present invention will be applicable to various
computing configurations where the number of clients, load balancers, servers,
and services will vary as a matter of choice depending on the particular
implementation.

The embodiments of the invention described herein are preferably
implemented as logical operations in a computing system. The logical
operations of the present invention are implemented (1) as a sequence of
computer-implemented steps running on the computing system, or (2) as
interconnected modules within the computing system. The implementation is
a matter of choice dependent on the performance requirements of the
computing system implementing the invention. Accordingly, the logical
operations making up the embodiments of the invention described herein are
referred to variously as operations, steps, or modules.

While the method disclosed herein has been described and shown with
reference to particular steps performed in a particular order, it will be
understood that these steps may be combined, sub-divided, or re-ordered to
form an equivalent method without departing from the teachings of the present
invention. Accordingly, unless specifically indicated herein, the order and
grouping of the steps is not a limitation of the present invention.

The foregoing embodiments and examples are to be considered
illustrative, rather than restrictive of the invention, and those modifications,
which come within the meaning and range of equivalence of the claims, are to
be included therein. While the invention has been particularly shown and
described with reference to a preferred embodiment thereof, it will be
understood by those skilled in the art that various other changes in the form
and details may be made without departing from the spirit and scope of the
invention.
We claim:

1. In a computer network, a method for distributing a request from
a client computing station for a service from a web site having a plurality of
servers adapted to service the request, comprising:

receiving the request;

determining if the service requested is offered by a first server and a
second server of said plurality of servers;

obtaining a quality of service metric from said first server;

obtaining a quality of service metric from said second server;

based, at least, on the quality of service metrics obtained from the first
and second servers, determining whether the first server should service said
request; and

if the determining step determines that the first server should service
the request, passing the request to the first server.

2. The method of claim 1, further comprising:

obtaining a number of pending requests at the first server, and
obtaining a number representing the time required to service said pending
requests by said first server; and

obtaining a number of pending requests at the second server, and
obtaining a number representing the time required to service said pending
requests by said second server;

wherein said determining step determines whether the first server
should service said request based, at least, on the quality of service metrics
obtained from the first and second servers, the number of pending requests at
the first server, the number representing the time required to service said
pending requests at the first server, the number of pending requests at the
second server, and the number representing the time required to service said
pending requests at the second server.

3. The method of claim 1, further comprising:

providing a client agent operating on the client computing station;

providing a server agent operating on the first server; and

transmitting a message from the client agent to the server agent, said
message containing a data rate of data transferred from the client computing
station to the first server, wherein said data rate is used as a quality of service
metric of said first server.

4. The method of claim 1, further comprising:

translating said request into a transparent message format usable by
said first and second server.

5. The method of claim 1, further comprising:

translating said request into a transparent message format usable by
said first and second server, wherein said translation uses XML format with a
start designator identifying the beginning of the message, a type designator
identifying the type of service, and an end designator indicating the end of the
message.

6. The method of claim 1, further comprising:

receiving over the network a remote request for said quality of service
metrics as a measure of network performance; and

communicating said quality of service metrics in response to said
remote request.

7. The method of claim 1, further comprising:

upon an addition of a third new server to the web site, detecting the
presence of the new third server;

determining if the new third server offers the service requested by the
client;

obtaining a quality of service metric from the new third server; and

including the quality of service metric from the new third server in the
determination of whether the first server should service the request.

8. The method of claim 1, further comprising:

upon the addition of a new service on one of said plurality of servers of
the web site, detecting the presence of the new service;

determining if the new service is the same type of service requested by
the client;

obtaining a quality of service metric from the server associated with the
new service; and

including the quality of service metric from the server associated with
the new service in determining whether the first server should service the
request.

9. A device for determining if a request from a client computing
station for a service in a network should be processed by a first server adapted
to service the request or by a second server adapted to service the request,
comprising:

a front end module for receiving the request and translating the request
into a transparent message format;

a coordinating module for determining if the first and second
servers are active; and

at least one load balancing module, in communications with said first
and second servers, for determining whether the first server should service
said request, and if so, passing the request to the first server.



10. The device of claim 9, wherein said front end module translates
said request into an XML format.

11. The device of claim 9, wherein said front end module translates
said request into a binary format.

12. The device of claim 9, wherein said coordinating module
determines if said request should be passed to the load balancing module.

13. The device of claim 9, wherein said load balancing module
receives a quality of service metric from said first server and from said second
server, and determines whether the first server should service the request
based in part on said metrics.

14. The device of claim 9, wherein said load balancing module
obtains a number of pending requests at the first server, obtains a number
representing the time required to service said pending requests by said first
server, obtains a number of pending requests at the second server, and obtains
a number representing the time required to service said pending requests by
said second server;

wherein said load balancing module determines whether the first server
should service said request based, at least, on the quality of service metrics
obtained from the first and second servers, the number of pending requests at
the first server, the number representing the time required to service said
pending requests at the first server, the number of pending requests at the
second server, and the number representing the time required to service said
pending requests at the second server.

15. The device of claim 9, further comprising:

a first communications interface to said first server for coupling said
load balancing module to said first server;

a second communications interface to said second server for coupling
said load balancing module to said second server; and

a third communications interface reserved for the dynamic addition of a
third server for coupling said load balancing module to said third server.

16. The device of claim 9, further comprising:

a second load balancing module reserved for dynamically recognizing a
second service.

17. A system for receiving and servicing a request from a client
computing station for a service in a network, comprising:

a first server adapted to service the request;

a second server adapted to service the request; and

a device for determining if the request should be processed by the first
server or the second server, said device comprising:

a front end module for receiving the request and translating the
request into a transparent message format;

a coordinating module for determining if the first and
second servers are active; and

at least one load balancing module, in communications with said
first and second servers, for determining whether the first server should
service said request, and if so, passing the request to the first server.

18. The system of claim 17, wherein said first server has an input
queue for tracking the pending requests to be processed by the first server.

19. The system of claim 17, wherein said first server maintains a list
of pending requests and a number corresponding to the time for completing
each of the pending requests.

20. The system of claim 17, further comprising:

a quality of service agent operating at the client;

a quality of service agent operating on said first server adapted to
communicate with the quality of service agent operating at the client, and
adapted to communicate with said load balancing module with a message
containing data of the quality of service between the client and the first server;
and

a quality of service agent operating on said second server adapted to
communicate with the quality of service agent operating at the client, and
adapted to communicate with said load balancing module with a message
containing data of the quality of service between the client and the second
server.

21. The system of claim 20, wherein said load balancing module
determines whether the first server should service said request based in part
on the data of the quality of service between the client and the first server and the
data of the quality of service between the client and the second server.
Name and mailing address of the ISA/US:
Commissioner of Patents and Trademarks
Box PCT
Washington, D.C. 20231
Facsimile No. (703) 305-3230

Date of mailing of the international search report: [illegible]

Authorized officer: GLENTON BURGESS
Telephone No. (703) 305-4792

Form PCT/ISA/210 (second sheet) (July 1998)



